
Today’s tutorial highlights progress to date through an “end-to-end” workflow using a real-world example of (1) getting, (2) manipulating, and (3) enriching a hydrofabric.

If you successfully complete this tutorial, you will create the minimal set of data files (and gain the skills to experiment!) needed for the AWI datastream and NGIAB.

This tutorial can be followed from this webpage, which has complete discussion and text surrounding the respective code chunks, or from the companion R script that can be found here.

Before you jump into this, ensure your environment is set up by installing R as detailed here.

Getting Started

# Install -----------------------------------------------------------------
# install.packages("remotes") 
# install.packages("powerjoin") 
remotes::install_github("NOAA-OWP/hydrofabric")

Attach Package

library(hydrofabric)

# helper function to use throughout the tutorial

make_map = function(file, pois) {
  hf = read_hydrofabric(file)
  mapview::mapview(hf$catchments) + hf$flowpaths + pois
}

### ---- Sample out files and source for today ---- ###
fs::dir_create("tutorial")

source    <- '/Users/mjohnson/hydrofabric/'

reference_file  <- "tutorial/poudre.gpkg"
refactored_file <- "tutorial/refactored.gpkg"
aggregated_file <- "tutorial/aggregated.gpkg"

nextgen_file       <- "tutorial/poudre_ng.gpkg"
model_atts_file    <- "tutorial/poudre_ng_attributes.parquet"
model_weights_file <- "tutorial/poudre_ng_weights.parquet"

Building a NextGen Hydrofabric

Get Reference Fabric (subsetting)

For this example, we want to prepare a NextGen hydrofabric and associated products for the area upstream of NWIS gage 06752260, which sits on the Cache La Poudre River in Fort Collins, Colorado. Feel free to use any location you like (as supported by our subsetting tools); we’ve set up the Lynker-Spatial Hydrolocation Viewer to make finding an appropriate starting reference point of interest (POI) easier. This POI will define the most downstream point in our network, and we’ll need to pull out and save (subset) the reaches that drain to this point in order to collect the network needed for the model domain.

The lynker-spatial hydrolocation inventory is both a subset and a superset of the community POI set: we use a subset of the community POIs and add a selection needed for NextGen modeling. These include (but are not limited to) NWS LIDs, Coastal/Terrestrial interactions, NWM reservoirs and lakes, coastal gages, and more!

## ---  Define starting feature by source and ID
## https://waterdata.usgs.gov/monitoring-location/06752260
## https://reference.geoconnex.us/collections/gages/items?provider_id=06752260
# Use get_subset to build a reference subset

get_subset(
  hl_uri = "Gages-06752260",
  source  = source,
  type = "reference",
  hf_version = "2.2",
  lyrs = c("divides", "flowlines", "network"),
  outfile = reference_file,
  overwrite = TRUE
)
st_layers(reference_file)
## Driver: GPKG 
## Available layers:
##   layer_name geometry_type features fields             crs_name
## 1    divides       Polygon     1122      5 NAD83 / Conus Albers
## 2  flowlines   Line String     1129     19 NAD83 / Conus Albers
## 3    network            NA     1145     23                 <NA>

Get some Points of Interest

There are many locations on the network (e.g., dams, gages) that we want to ensure are preserved through any network manipulation. That means that no matter how a fabric is refactored or aggregated, key hydrolocations persist. This is critical to ensuring cross-dataset interoperability, consistent data streams for assimilation and model coupling, and persistent nexus locations.

For this example we’ll read all hydrolocations from the community POI set (GFv20), convert them to spatial points, and keep only those within the reference subset domain.

hf = read_hydrofabric(reference_file)

pois = open_dataset(glue("{source}/v2.2/conus_hl")) %>%
  filter(hl_source == 'GFv20', 
         vpuid %in% unique(hf$flowpaths$vpuid),
         hf_id %in% hf$flowpaths$id) %>%
  collect() %>%
  st_as_sf(coords = c("X", "Y"), crs = 5070)
make_map(reference_file, pois)

Build a Refactored Fabric

The reference network provides the minimal discretization of the landscape and river network offered by this system. Because it is derived from a traditional cartographic product, we need to remove small river segments that are too short for stable routing calculations and split long, narrow catchments that have long flow paths. This process is known as refactoring and is described in detail in the refactoring section here.

refactored = refactor(
  reference_file,
  split_flines_meters = 10000,
  collapse_flines_meters = 1000,
  collapse_flines_main_meters = 1000,
  pois = pois,
  fac = '/vsis3/lynker-spatial/gridded-resources/fac.vrt',
  fdr = '/vsis3/lynker-spatial/gridded-resources/fdr.vrt',
  outfile = refactored_file
)
make_map(refactored_file, pois)

Build an Aggregated Network

This next set of steps runs the aggregation tools over the refactored network, aggregating it toward a uniform catchment-size distribution. The first step is to remap the hydrolocations we enforced in the refactored fabric. Every hydrofabric::refactor execution produces a lookup table that relates the original hydrofabric IDs to the new identifiers they became, so a quick join provides a mapping of hydrolocations to the refactored network. Passing these to the aggregate_* functions ensures they are not aggregated over in the processing.

hydrolocations = read_sf(refactored_file, 'lookup_table') %>%
  inner_join(pois, by = c("NHDPlusV2_COMID" = "hf_id")) %>%
  select(poi_id, NHDPlusV2_COMID, id = reconciled_ID) %>%
  distinct()

head(hydrolocations)
## # A tibble: 6 × 3
##   poi_id NHDPlusV2_COMID    id
##    <int>           <dbl> <int>
## 1  37345         2899997     2
## 2  37014         2899553    10
## 3  36913         2900669    15
## 4  36920         2900581    24
## 5  36914         2900571    28
## 6  36664         2898115    44
aggregate_to_distribution(
  gpkg = refactored_file,
  hydrolocations = hydrolocations,
  ideal_size_sqkm = 10,
  min_length_km = 1,
  min_area_sqkm = 3,
  outfile = aggregated_file,
  overwrite = TRUE )
make_map(aggregated_file, pois)

Generate a NextGen Network

In order to make this hydrofabric compliant with the NextGen flowline-to-nexus topology and mapping, we’ll run the following over our aggregated network.

unlink(nextgen_file)
apply_nexus_topology(aggregated_file, export_gpkg = nextgen_file)
## [1] "tutorial/poudre_ng.gpkg"
hf = read_hydrofabric(nextgen_file)
                      
make_map(nextgen_file, read_sf(nextgen_file, "nexus"))

And there you have it! This is the minimal set of information needed in a NextGen hydrofabric!

Enriching the Network

Derive the divide-level data needed for CFE/NOM/PET

The catchments generated in the network preparation need land surface attributes before we can run hydrologic simulations over them. This sort of data typically includes things like soil type, average basin slope, and land cover. The easiest way to accomplish this is to use climateR to access data and zonal to rapidly summarize gridded data to the POLYGON scale. Both are core components of the NOAA-OWP/hydrofabric meta package and should already be loaded!
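The summaries below call mode() and geometric_mean(). If these helpers are not already provided in your session by the attached packages (an assumption worth checking), minimal base-R versions can be sketched as:

```r
# Hypothetical helper definitions: only needed if mode() and
# geometric_mean() are not already available in your session.

# Statistical mode (most frequent value). Note that base R's mode()
# reports an object's storage mode, not a statistical mode.
mode <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Geometric mean: exponential of the mean of the logs
geometric_mean <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))
}

mode(c(1, 2, 2, 3))           # 2
geometric_mean(c(1, 10, 100)) # 10
```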

vsi <- "/vsis3/lynker-spatial/gridded-resources"
div <- read_sf(nextgen_file, "divides")

NOAH-OWP Variables

nom_vars <- c("bexp", "dksat", "psisat", "smcmax", "smcwlt")

r = rast(glue("{vsi}/nwm/conus/{nom_vars}.tif"), lyrs = seq(1,length(nom_vars)*4, by = 4))

modes = execute_zonal(r[[1]], 
                    fun = mode,
                    div, ID = "divide_id", 
                    join = FALSE)  %>% 
    setNames(gsub("fun.", "", names(.)))

gm = execute_zonal(r[[2:3]], 
                    fun = geometric_mean,
                    div, ID = "divide_id", 
                    join = FALSE)  %>% 
    setNames(gsub("fun.", "", names(.)))

m = execute_zonal(r[[4:5]], 
                    fun = "mean",
                    div, ID = "divide_id", 
                    join = FALSE)  %>% 
    setNames(gsub("mean.", "", names(.)))

d1 <- power_full_join(list(modes, gm, m), by = "divide_id")

GW Routing parameters

GW data comes from the routelink file (traditionally GWBUCKPARM_CONUS_FullRouting.nc in the NWM)

crosswalk <- as_sqlite(nextgen_file, "network") |>
    select(hf_id, divide_id) |>
    collect()

d2 <- open_dataset(glue("{source}/v2.2/reference/conus_routelink")) |>
    select(hf_id , starts_with("gw_")) |>
    inner_join(mutate(crosswalk, hf_id = as.integer(hf_id)), by = "hf_id") |>
    group_by(divide_id) |>
    collect() |>
    summarize(
      gw_Coeff = round(weighted.mean(gw_Coeff, w = gw_Area_sqkm, na.rm = TRUE), 9),
      gw_Zmax_mm  = round(weighted.mean(gw_Zmax_mm,  w = gw_Area_sqkm, na.rm = TRUE), 9),
      gw_Expon = mode(floor(gw_Expon))
    )

Forcing Downscaling base data

Tools like CIROH ngen-datastream and NextGen in a Box require forcings that are downscaled using attributes like the catchment centroid, mean elevation, slope, and aspect.

  • X, Y centroid
  • Mean elevation and slope
  • Circular mean of aspect.
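Aspect is a circular quantity, which is why the list above calls for a circular (not arithmetic) mean: averaging 350° and 10° should give 0°, not 180°. Conceptually, the circular mean averages the sine and cosine components of the angles. A sketch of the idea (illustration only, not zonal’s implementation):

```r
# Conceptual circular mean of angles in degrees (illustration only;
# the tutorial itself uses the circular_mean function from zonal).
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  (atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi) %% 360
}

circ_mean_deg(c(350, 10)) # 0, where the arithmetic mean would give 180
```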

X Y (for forcing downscaling)

d3 <- st_centroid(div) |>
  st_transform(4326) |>
  st_coordinates() |>
  data.frame() |>
  mutate(divide_id = div$divide_id)

Elevation data for Forcing downscaling and NOAH-OWP

dem_vars <- c("elev", "slope", "aspect")

r  <- rast(glue('{vsi}/250m_grids/usgs_250m_{dem_vars}.tif'))

d4 <- execute_zonal(r[[1:2]], 
                    div, ID = "divide_id", 
                    join = FALSE) |>
    setNames(c("divide_id", "elevation_mean", "slope"))

d5 <- execute_zonal(r[[3]], 
                     div, ID = "divide_id", fun = circular_mean, 
                     join = FALSE) |>
    setNames(c("divide_id", "aspect_c_mean"))
model_attributes <- power_full_join(list(d1, d2, d3, d4, d5), by = "divide_id")
  
write_parquet(model_attributes, model_atts_file)

Forcing Weight Grids
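weight_grid computes, for each divide, the fraction of every forcing-grid cell covered by that divide. Downstream tools then derive a divide’s forcing value as the coverage-weighted mean of the overlapping cells. A toy base-R illustration of how such weights are applied (the column names mirror a typical weight-grid layout and are assumptions here):

```r
# Toy weight table: two divides overlapping two forcing-grid cells.
# 'coverage_fraction' is the share of each cell falling in the divide.
w <- data.frame(
  divide_id         = c("cat-1", "cat-1", "cat-2"),
  cell              = c(1L, 2L, 2L),
  coverage_fraction = c(1.0, 0.25, 0.75)
)

vals  <- c(10, 20)     # forcing value in each grid cell
w$val <- vals[w$cell]  # look up the value each weight applies to

# Coverage-weighted mean per divide
sapply(split(w, w$divide_id),
       function(d) weighted.mean(d$val, d$coverage_fraction))
##  cat-1  cat-2
##     12     20
```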

type = "medium_range.forcing"

w = weight_grid(rast(glue('{vsi}/{type}.tif')), div, ID = "divide_id") |> 
  mutate(grid_id = type)

head(w)

write_parquet(w, model_weights_file)

Extracting Cross Sections


crosswalk <- as_sqlite(nextgen_file, "network") |>
    select(hf_id, id, divide_id, hydroseq, poi_id) |>
    filter(!is.na(poi_id)) %>% 
    collect() %>% 
    slice_min(hydroseq)

open_dataset(glue("{source}/v2.2/reference/conus_routelink/")) |>
    select(hf_id, starts_with("ml_")) 
## FileSystemDataset (query)
## hf_id: int32
## ml_tw_inchan_m: double
## ml_tw_bf_m: double
## ml_y_inchan_m: double
## ml_y_bf_m: double
## ml_ahg_c: double
## ml_ahg_f: double
## ml_ahg_a: double
## ml_ahg_b: double
## ml_ahg_k: double
## ml_ahg_m: double
## ml_r: double
## ml_bf_channel_area_m2: double
## ml_inchan_channel_area_m2: double
## ml_bf_channel_perimeter_m: double
## ml_inchan_channel_perimeter_m: double
## ml_roughness: double
## ml_hf_source: string
## 
## See $.data for the source Arrow object
(cs <- open_dataset(glue("{source}/v2.2/reference/conus_routelink/")) |>
    select(hf_id, ml_y_bf_m, ml_tw_bf_m, ml_r) %>% 
    inner_join(mutate(crosswalk, hf_id = as.integer(hf_id)), by = "hf_id") |>
    collect() %>% 
    summarise(TW = mean(ml_tw_bf_m),
              r = mean(ml_r),
              Y = mean(ml_y_bf_m),
              poi_id = poi_id[1]))
## # A tibble: 1 × 4
##      TW     r     Y poi_id
##   <dbl> <dbl> <dbl> <chr> 
## 1  19.6  36.3  1.48 35836
bathy = AHGestimation::cross_section(r = cs$r, TW = cs$TW, Ymax = cs$Y) 

plot(bathy$x, bathy$Y, type = "l", 
     xlab = "Relative distance (m)", 
     ylab = "Depth (m)", 
     main = glue("Average XS at POI: {cs$poi_id}"))

library(plotly)

crosswalk <- as_sqlite(nextgen_file, "network") |>
    select(hf_id, id, toid, divide_id, hydroseq, poi_id) |>
    collect() %>% 
    slice_max(hydroseq)

cw = open_dataset(glue('{source}/v2.1.1/nextgen/conus_network')) %>% 
  semi_join(crosswalk, by = "hf_id") %>% 
  collect() 

message(sum(cw$lengthkm), " kilometers of river")

open_dataset(glue('{source}/v2.1.1/nextgen/conus_xs')) %>% 
  filter(vpuid %in% unique(cw$vpuid), hf_id %in% unique(cw$id)) %>% 
  group_by(hf_id, cs_id) %>% 
  collect() %>% 
  mutate(uid = cur_group_id()) %>% 
  plot_ly(x = ~X, y = ~Y, z = ~Z,  split = ~as.factor(uid),
          type = 'scatter3d', mode = 'markers+lines',
          line = list(width = 3), marker = list(size = 2)) %>% 
  layout(list(aspectmode='manual',
              aspectratio = list(x=100, y=100, z=1)),
              showlegend = FALSE)

Populate Flowpath Attributes

Slowly, the ML-enhanced, DEM-based cross sections are being used to supplement a national routelink file (conus_routelink) that is complete with the routing attributes needed for both t-route and WRF-Hydro / the NWM to execute. We are striving to implement the routelink file at the reference fabric level, meaning it can be extended to any derived product. As such, the length-averaged contribution of each reference flowline to its aggregated flowpath needs to be calculated. This can be done in the following way:

add_flowpath_attributes(nextgen_file, source = source)
## [1] "tutorial/poudre_ng.gpkg"
# Data
as_sqlite(nextgen_file, 'flowpath_attributes') %>% 
  collect() %>% 
  head()
## # A tibble: 6 × 13
##     fid id     rl_Qi_m3s rl_MusX   rl_n  rl_So rl_ChSlp rl_BtmWdth_m
##   <int> <chr>      <dbl>   <dbl>  <dbl>  <dbl>    <dbl>        <dbl>
## 1     1 wb-1           0     0.2 0.06   0.0197    0.517         3.91
## 2     2 wb-10          0     0.2 0.0565 0.0481    0.420         9.14
## 3     3 wb-100         0     0.2 0.06   0.0971    0.641         2.33
## 4     4 wb-101         0     0.2 0.06   0.064     0.634         2.39
## 5     5 wb-102         0     0.2 0.06   0.0726    0.628         2.45
## 6     6 wb-103         0     0.2 0.06   0.0552    0.679         2.05
## # ℹ 5 more variables: rl_Kchan_mmhr <dbl>, rl_nCC <dbl>, rl_TopWdthCC_m <dbl>,
## #   rl_TopWdth_m <dbl>, length_m <dbl>
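The length-averaging described above can be illustrated with a toy example: if an aggregated flowpath is composed of three reference flowlines, a scalar attribute like Manning’s roughness becomes the length-weighted mean of the member values (the numbers below are made up):

```r
# Three reference flowlines merged into one aggregated flowpath
length_m <- c(500, 1500, 1000)   # member flowline lengths (m)
n_member <- c(0.06, 0.05, 0.04)  # Manning's roughness of each member

# Length-weighted roughness for the aggregated flowpath
weighted.mean(n_member, w = length_m)
## [1] 0.04833333
```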

Adding GPKG Symbology

append_style(nextgen_file, layer_names = c("divides", "flowpaths", "nexus"))
## [1] "tutorial/poudre_ng.gpkg"